Multilingual corpora for speech-to-speech translation research

نویسندگان

  • Gen-ichiro Kikui
  • Toshiyuki Takezawa
  • Seiichi Yamamoto
چکیده

Multilingual spoken language corpora are indispensable for developing new speech-to-speech machine translation (S2SMT) technologies. This paper first discusses characteristics that corpora for S2SMT should have, then surveys existing corpora. Finally, it compares these corpora.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Development of the Multilingual LUNA Corpus for Spoken Language System Porting

The development of annotated corpora is a critical process in the development of speech applications for multiple target languages. While the technology to develop a monolingual speech application has reached satisfactory results (in terms of performance and effort), porting an existing application from a source language to a target language is still a very expensive task. In this paper we addr...

متن کامل

Linguistic representation of Finnish in a limited domain speech-to-speech translation system

This paper describes the development of Finnish linguistic resources for use in MedSLT, an Open Source medical domain speech-to-speech translation system. The paper describes the collection of the medical sub-domain corpora for Finnish, the creation of the Finnish generation grammar by adapting the original English grammar, the composition of the domain specific Finnish lexicon and the definiti...

متن کامل

Proceedings of Meetings on Acoustics

India possesses a large variety of languages and dialects spoken in different parts of the country. These languages possess some unique linguistic, phonological and phonetic properties different from European languages. Research is being done in several of Indian languages such as Hindi, Bangla, etc. to study the articulatory, acoustic, Phonetic and prosodic nature for the purpose of creating s...

متن کامل

Statistical speech-to-speech translation with multilingual speech recognition and bilingual-chunk parsing

Initiated mainly from speech community, researches in speech to speech (S2S) translation have made steady progress in the past decade. Many approaches to S2S translation have been proposed continually. Among of them, corpus-dependent statistical strategies have been widely studied during recent years. In corpus-based translation methodology, rather than taking the corpus just as reference templ...

متن کامل

Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences

This paper presents novel improvements to the induction of translation lexicons from monolingual corpora using multilingual dependency parses. We introduce a dependency-based context model that incorporates long-range dependencies, variable context sizes, and reordering. It provides a 16% relative improvement over the baseline approach that uses a fixed context window of adjacent words. Its Top...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004